Abstract: Can compression algorithms be employed for recovering signals from their underdetermined set of linear measurements? Addressing this question is the first step towards applying compression algorithms for compressed sensing (CS). In this paper, we consider a family of compression algorithms $\mathcal{C}_R$, parametrized by rate $R$, for a compact class of signals $\mathcal{Q} \subset \mathbb{R}^n$. The set of natural images and JPEG2000 at different rates are examples of $\mathcal{Q}$ and $\mathcal{C}_R$, respectively. We establish a connection between the rate-distortion performance of $\mathcal{C}_R$ and the number of linear measurements required for successful recovery in CS. We then propose the compressible signal pursuit (CSP) algorithm and prove that, with high probability, it accurately and robustly recovers signals from an underdetermined set of linear measurements. We also explore the performance of CSP in the recovery of infinite-dimensional signals. Exploring approximations or simplifications of CSP, which is computationally demanding, is left for future research.



I. Introduction 

The field of compressed sensing (CS) was established on a 
keen observation that if a signal is sparse in a certain basis it 
can be recovered from far fewer random linear measurements 
than its ambient dimension [1], [2]. In the last decade, CS
recovery algorithms have evolved to capture more complicated
signal structures such as group sparsity, atomic structure, and
nuclear norm minimization [3]-[19]. In this paper, we consider
a different type of structure based on compression algorithms. 
Suppose that a class of signals can be "efficiently" compressed 
by a compression algorithm. Intuitively speaking, such classes 
of signals have a certain "structure" that enables the com- 
pression algorithm to represent them with fewer bits. These 
structures are often much more complicated than sparsity, and 
employing them in CS can potentially reduce the number of 
measurements required for signal recovery. 

In this paper, we aim to address the following problem. Is 
it possible to employ compression schemes in the CS problem 
and design algorithms that recover signals, either exactly or 
with "small error", from random linear measurements? As 
we will prove in this paper, the answer to this question 
is affirmative. We propose a CS recovery algorithm based 
on exhaustive search over the set of "compressible" signals, 
that, under a certain condition on the rate-distortion function,
recovers signals from fewer measurements than their ambient 
dimension. This result provides the first theoretical basis for 
using compression algorithms in CS. 

The organization of the paper is as follows. Section II reviews the main concepts used in this paper and formally states the problem addressed in the paper. Section III summarizes our main contributions. Section IV extends our results to the class of analog signals. Section V reviews the related work in the literature. Section VI includes the proofs of our main theorems. Finally, Section VII concludes the paper.



II. Background and problem definition 

In this section we first review the concept of compression 
and rate-distortion function. Then we state the problem we 
address in this paper more formally. 

A. Notation 

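Boldfaced letters such as $\mathbf{x}$ and $\mathbf{X}$ represent vectors. Calligraphic letters denote sets. Given a finite set $\mathcal{A}$, $|\mathcal{A}|$ denotes its size. The $\ell_p$-norm of $\mathbf{x} \in \mathbb{R}^n$ is defined as $\|\mathbf{x}\|_p \triangleq \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$. The $\ell_0$-norm is also defined as $\|\mathbf{x}\|_0 \triangleq |\{i : x_i \neq 0\}|$. Note that for $p < 1$, $\|\cdot\|_p$ is a semi-norm since it does not satisfy the triangle inequality.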

B. Rate-distortion function 

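Let $\mathcal{Q}$ denote a compact subset of $\mathbb{R}^n$. Consider a compression algorithm for $\mathcal{Q}$ described by encoder and decoder mappings $(\mathcal{E}, \mathcal{D})$. Encoder

$$\mathcal{E} : \mathcal{Q} \to \{1, 2, \ldots, 2^R\}$$

maps signal $\mathbf{x} \in \mathcal{Q}$ to codeword $\mathcal{E}(\mathbf{x})$. Decoder

$$\mathcal{D} : \{1, 2, \ldots, 2^R\} \to \hat{\mathcal{Q}}$$

maps the coded signal $\mathcal{E}(\mathbf{x})$ back to the reconstruction domain $\hat{\mathcal{Q}}$. Let $\hat{\mathbf{x}} = \mathcal{D}(\mathcal{E}(\mathbf{x}))$ denote the reconstruction of signal $\mathbf{x} \in \mathcal{Q}$. The performance of the described coding scheme at rate $R$ is measured in terms of its induced distortion, defined as

$$D(R) = \sup_{\mathbf{x} \in \mathcal{Q}} \|\mathbf{x} - \mathcal{D}(\mathcal{E}(\mathbf{x}))\|_2.$$

Throughout the paper, we usually consider a family of fixed-rate compression algorithms $\{(\mathcal{E}_R, \mathcal{D}_R) : R > 0\}$ parametrized by the rate $R$, and its corresponding rate-distortion function $R(D)$, defined as

$$R(D) = \inf\{R : D(R) \le D\}.$$

Given compression algorithm $(\mathcal{E}_R, \mathcal{D}_R)$, let $\mathcal{C}_R$ denote its codebook, defined as

$$\mathcal{C}_R \triangleq \{\mathcal{D}_R(\mathcal{E}_R(\mathbf{x})) : \mathbf{x} \in \mathcal{Q}\}.$$

Note that $|\mathcal{C}_R| \le 2^R$.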
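As a concrete illustration of the encoder, decoder, codebook, and induced distortion, the following minimal sketch uses a uniform scalar quantizer on the interval $[-1, 1]$. The choice of this one-dimensional set and all function names are assumptions made for illustration; they are not taken from the paper. Indices are 0-based here, whereas the text uses $\{1, \ldots, 2^R\}$, and the supremum in $D(R)$ is approximated on a finite grid.

```python
def encode(x, R):
    """Encoder: map x in [-1, 1] to the (0-based) index of its quantization cell."""
    levels = 2 ** R
    idx = int((x + 1.0) / 2.0 * levels)
    return min(idx, levels - 1)  # x = 1 is assigned to the last cell


def decode(i, R):
    """Decoder: map index i to the midpoint of cell i."""
    levels = 2 ** R
    return -1.0 + (2 * i + 1) / levels


def codebook(R, samples):
    """Set of reconstructions D(E(x)) reached by the sample points."""
    return sorted({decode(encode(x, R), R) for x in samples})


def distortion(R, samples):
    """Grid approximation of sup_x |x - D(E(x))|."""
    return max(abs(x - decode(encode(x, R), R)) for x in samples)


# Hypothetical setup: a dense grid on [-1, 1] and a rate of R = 3 bits.
samples = [-1.0 + j / 1000.0 for j in range(2001)]
R = 3
book = codebook(R, samples)
D_R = distortion(R, samples)
print(len(book), D_R)
```

For this toy code the distortion halves with every extra bit, so $R(D)$ grows like $\log_2(1/D)$.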

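In the remainder of this section we illustrate the concepts with two examples. Let $\mathcal{B}_2^n(\rho) = \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\|_2 \le \rho\}$ represent a ball of radius $\rho$ in $\mathbb{R}^n$. Also, let $\Gamma_k^n$ denote the set of all $k$-sparse signals in $\mathbb{R}^n$, i.e.,

$$\Gamma_k^n \triangleq \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\|_0 \le k\}.$$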

Example 1: There exists a family of compression algorithms that achieves the following rate-distortion function on $\mathcal{B}_2^n(\rho)$:

$$R(D) = \frac{1}{2} n \log n + n \log\left(\frac{\rho}{D}\right) + cn,$$

for $D < \sqrt{n}$. Here, $c$ is a constant less than 3.

Example 2: There exists a family of compression algorithms that achieves the following rate-distortion function on $\mathcal{B}_2^n(\rho) \cap \Gamma_k^n$:

RiD)^log Q+fclog(^^^+cfc, 

where c is a constant less than 3. 

These are classic examples in the literature. However, we review their proofs in Section VI-B to clarify the concepts introduced here. Note that since $\mathcal{B}_2^n(\rho) \cap \Gamma_k^n \subset \mathcal{B}_2^n(\rho)$, we expect to compress $\mathcal{B}_2^n(\rho) \cap \Gamma_k^n$ more efficiently. This is especially clear as $D \downarrow 0$.

C. Problem statement 

Consider the problem of recovering a "structured" signal $\mathbf{x} \in \mathbb{R}^n$ from its undersampled set of linear measurements $\mathbf{y} = A\mathbf{x}$, where $\mathbf{y} \in \mathbb{R}^d$, and $A \in \mathbb{R}^{d \times n}$, $d < n$, denotes the measurement matrix. For various types of structure such as sparsity, it is well known that $\mathbf{x}$ may be recovered from measurements $\mathbf{y}$ even though $d < n$. In this paper we explore a more elaborate type of structure based on compressibility.

Instead of being structured as sparse, smooth, etc., suppose that the signal belongs to a compact set $\mathcal{Q} \subset \mathbb{R}^n$ and there exists a family of compression algorithms $\{(\mathcal{E}_R, \mathcal{D}_R) : R > 0\}$ with rate-distortion function $R(D)$ for signals in $\mathcal{Q}$. For instance, we can consider the JPEG2000 compression algorithm [20] at different rates for the class of images. This family of compression algorithms might be exploiting the sparsity of the signal in a certain domain or any other type of structure. The actual mechanism by which the algorithm compresses the signals in $\mathcal{Q}$ is not important for the purpose of this paper. Instead, we are interested in recovering vector $\mathbf{x} \in \mathcal{Q}$ from an undersampled set of linear equations $\mathbf{y} = A\mathbf{x}$ by employing the compression algorithms $\{(\mathcal{E}_R, \mathcal{D}_R) : R > 0\}$. The question is when this is possible, and what it implies for the rate-distortion function $R(D)$. Since for every compact set we can define a family of compression algorithms, it seems that the existence of compression algorithms does not necessarily lead to a CS-recovery method. The following lemma confirms this intuition.

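Lemma 1: Let $\mathcal{Q} = \mathcal{B}_2^n(1)$. If the number of measurements is less than the ambient dimension $n$, any CS-recovery algorithm will result in an $\ell_2$ reconstruction error of at least 1, for any measurement matrix.

Proof: Consider any reconstruction algorithm for the signals in $\mathcal{B}_2^n(1)$ based on their linear measurements acquired by measurement matrix $A \in \mathbb{R}^{d \times n}$, with $d < n$. Let $\mathrm{Ker}(A) = \{\mathbf{x} : A\mathbf{x} = \mathbf{0}\}$. Since $d < n$, $\mathrm{Ker}(A) \setminus \{\mathbf{0}\} \neq \emptyset$. All signals in $\mathrm{Ker}(A) \cap \mathcal{B}_2^n(1)$ are mapped to the all-zero measurement vector, and hence the recovery algorithm maps all of them to some $\mathbf{x}_0 \in \mathbb{R}^n$. It is straightforward to confirm that

$$\inf_{\mathbf{x}_0} \sup_{\mathbf{x} \in \mathrm{Ker}(A) \cap \mathcal{B}_2^n(1)} \|\mathbf{x} - \mathbf{x}_0\|_2 = 1.$$

In fact, the best reconstruction for $\mathbf{x} \in \mathrm{Ker}(A) \cap \mathcal{B}_2^n(1)$ is $\mathbf{x}_0 = \mathbf{0}$, which leads to $\sup_{\mathbf{x} \in \mathrm{Ker}(A) \cap \mathcal{B}_2^n(1)} \|\mathbf{x} - \mathbf{x}_0\|_2 = 1$. ■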
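The kernel argument in the proof can be checked numerically in a small case. The sketch below is only an illustration under assumed sizes ($d = 2$, $n = 3$, so the kernel is spanned by the cross product of the two rows); the helper names are hypothetical. It shows that a unit-norm kernel vector $\mathbf{x}$ and its negative both produce zero measurements, so any estimate errs by at least 1 on one of them.

```python
import math
import random


def cross(u, v):
    """Cross product of two vectors in R^3."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def measure(A, x):
    """Linear measurements y = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


def unit_kernel_vector(A):
    """For a 2 x 3 matrix, the cross product of the rows spans Ker(A)."""
    k = cross(A[0], A[1])
    s = norm(k)
    return [ki / s for ki in k]


def worst_case_error(x0, x):
    """Largest error of the estimate x0 over the indistinguishable signals x and -x."""
    plus = norm([a - b for a, b in zip(x, x0)])
    minus = norm([-a - b for a, b in zip(x, x0)])
    return max(plus, minus)


rng = random.Random(0)
A = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)]
x = unit_kernel_vector(A)                            # lies in Ker(A) and in the unit ball
y = measure(A, x)                                    # numerically zero
guess = [rng.uniform(-1.0, 1.0) for _ in range(3)]   # an arbitrary estimate
print(y, worst_case_error([0.0, 0.0, 0.0], x), worst_case_error(guess, x))
```

The triangle inequality gives $\max(\|\mathbf{x} - \mathbf{x}_0\|_2, \|\mathbf{x} + \mathbf{x}_0\|_2) \ge \|\mathbf{x}\|_2 = 1$, which is what the last two printed values reflect.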



Therefore, the first step in employing compression algo- 
rithms for CS is to characterize the class of compression 
algorithms that can potentially lead to CS-recovery methods. 

Definition 1: Compressed sensing is said to be applicable to a compact set $\mathcal{Q} \subset \mathbb{R}^n$ with $d < n$ measurements if, for any $\epsilon > 0$, there exist a $d \times n$ matrix $A_\epsilon$ and a recovery algorithm $\Delta_\epsilon$ such that $\|\Delta_\epsilon(A_\epsilon \mathbf{x}) - \mathbf{x}\|_2 \le \epsilon$ for all $\mathbf{x} \in \mathcal{Q}$.

According to Lemma 1, CS is not applicable to $\mathcal{B}_2^n(1)$ with $d$ measurements, for any $d < n$. Next we define the $\alpha$-dimension of a rate-distortion function and establish its connection with CS-applicability.

Definition 2: Consider compact set $\mathcal{Q} \subset \mathbb{R}^n$ and a family of fixed-rate compression codes, $\{(\mathcal{E}_R, \mathcal{D}_R) : R > 0\}$, with rate-distortion function $R(D)$. Define the high-resolution rate-distortion dimension, or $\alpha$-dimension, of the family of codes as

$$\alpha = \limsup_{D \to 0} \frac{R(D)}{\log(1/D)}. \qquad (1)$$

In Section V we discuss the connection between $\alpha$-dimension and other well-known concepts in information theory and functional analysis.

Consider a compact set $\mathcal{Q} \subset \mathbb{R}^n$, and without loss of generality, assume that $\mathbf{0} \in \mathcal{Q}$. Since the set is compact,

$$\rho(\mathcal{Q}) \triangleq \sup_{\mathbf{x} \in \mathcal{Q}} \|\mathbf{x}\|_2$$

is finite. Therefore, $\mathcal{Q} \subset \mathcal{B}_2^n(\rho)$, and according to Example 1 there exists a family of compression algorithms with

$$R(D) \le n \log\left(\frac{\rho}{D}\right) + c, \qquad (2)$$

where $\rho = \rho(\mathcal{Q})$, $D < \rho$, and $c$ is a constant independent of the distortion level $D$. Therefore, for any compact set $\mathcal{Q}$, there exists a family of compression codes with $\alpha$-dimension upper-bounded by $n$. The interesting regime is when there exists a family of codes with $\alpha$-dimension strictly smaller than $n$. For instance, the set of $k$-sparse signals in $\mathcal{B}_2^n(1)$, discussed in Example 2, is a set for which there exists a family of compression algorithms with $\alpha$-dimension smaller than $n$. In the remainder of this section, we explore the connection between the CS-applicability of a compact set $\mathcal{Q}$ and the $\alpha$-dimension of a family of codes for $\mathcal{Q}$. The question is whether CS is applicable to $\mathcal{Q}$ with a number of measurements $d < n$, and if the answer is affirmative, what is the minimum number of measurements for this result to hold.
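To see how the bound (2) translates into an $\alpha$-dimension of at most $n$, one can evaluate the ratio $R(D)/\log(1/D)$ at decreasing distortion levels. The sketch below assumes a rate-distortion curve that meets (2) with equality; the values of $n$, $\rho$, and $c$ are hypothetical and chosen only for illustration.

```python
import math


def rate_ball(D, n, rho=1.0, c=0.0):
    """Hypothetical rate-distortion curve R(D) = n log(rho / D) + c, cf. bound (2)."""
    return n * math.log(rho / D) + c


def alpha_ratio(rate_fn, D):
    """The ratio R(D) / log(1 / D) whose limsup as D -> 0 defines the alpha-dimension."""
    return rate_fn(D) / math.log(1.0 / D)


n = 8
ratios = [alpha_ratio(lambda D: rate_ball(D, n, rho=2.0, c=5.0), 10.0 ** (-p))
          for p in (2, 8, 32, 128)]
print(ratios)  # decreases towards n as D shrinks
```

The constant terms $n \log \rho + c$ are divided by $\log(1/D)$ and vanish in the limit, which is why only the coefficient of $\log(1/D)$ matters for the $\alpha$-dimension.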

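Given $\alpha > 0$, let $\mathcal{S}_{\alpha,n}$ denote the set of all subsets of $\mathcal{B}_2^n(1)$ for which there exists a family of compression algorithms with $\alpha$-dimension upper-bounded by $\alpha$. For each $\mathcal{Q} \in \mathcal{S}_{\alpha,n}$, define $d_{\min}(\mathcal{Q})$ as the minimum number of measurements for which CS is applicable to $\mathcal{Q}$. The following theorem provides a lower bound for the number of measurements.

Theorem 1: If CS is applicable to every element of $\mathcal{S}_{\alpha,n}$ with $d$ measurements, then $d \ge \lfloor \alpha \rfloor$. In other words,

$$\sup_{\mathcal{Q} \in \mathcal{S}_{\alpha,n}} d_{\min}(\mathcal{Q}) \ge \lfloor \alpha \rfloor.$$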
Proof: Set $k = \lfloor \alpha \rfloor$ and define $\mathcal{T}_k$ as the set of vectors in $\mathcal{B}_2^n(1)$ whose $n - k$ final coordinates are equal to zero, i.e.,

$$\mathcal{T}_k = \{\mathbf{x} \in \mathcal{B}_2^n(1) : x_{k+1} = x_{k+2} = \cdots = x_n = 0\}.$$

As shown in Example 1, $\mathcal{T}_k \in \mathcal{S}_{\alpha,n}$. Also, a very simple modification of Lemma 1 proves that if the number of measurements is less than $k$, then the reconstruction error of any recovery algorithm is at least 1. Therefore, CS will not be applicable to $\mathcal{T}_k$. ■

Note that the notion of CS-applicability with d measure- 
ments is the minimal requirement for the practical use of CS. 
In particular, it does not require the robustness of the algorithm 
to measurement noise. Also, the measurement matrix can be 
adapted to the structure of data and the recovery algorithm 
can exploit any extra information about the set. In the rest 
of this paper, we show that considering random measurement 
matrices (nonadaptive measurements) and following Occam's 
principle results in an accurate and stable recovery algorithm. 

Our algorithm searches over the space of "compressible signals" and finds the one that matches the measurements the best. More formally, given a compression algorithm $(\mathcal{E}_R, \mathcal{D}_R)$ with codebook $\mathcal{C}_R$, consider the compressible signal pursuit (CSP) algorithm for recovering $\mathbf{x}_o \in \mathcal{Q}$ from its measurements $\mathbf{y}_o = A\mathbf{x}_o$, defined as

$$\hat{\mathbf{x}}_o = \arg\min_{\mathbf{c} \in \mathcal{C}_R} \|\mathbf{y}_o - A\mathbf{c}\|_2^2. \qquad (3)$$

Here the rate $R$ can be considered as a free parameter, which, as we will see in Section III, plays a role in the tradeoff between the success probability of CSP and its reconstruction error. Note that we still ignore one important aspect of practical algorithms, namely computational complexity. CSP is based on an exhaustive search and hence is computationally very demanding. Practical implementation of such ideas is left for future research.
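The exhaustive search in (3) can be sketched in a few lines. The example below is a toy instance under stated assumptions: the codebook consists of 1-sparse vectors in $\mathbb{R}^8$ with amplitudes on a coarse grid (a stand-in for $\mathcal{C}_R$, not a codebook from the paper), the signal is itself a codeword, and $d = 3$ Gaussian measurements are used. All names are hypothetical.

```python
import random


def measure(A, x):
    """Linear measurements y = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def csp_recover(A, y, codebook):
    """Exhaustive search (3): return the codeword c minimizing ||y - A c||_2^2."""
    def residual(c):
        return sum((yi - zi) ** 2 for yi, zi in zip(y, measure(A, c)))
    return min(codebook, key=residual)


def sparse_codebook(n, amplitudes):
    """Toy codebook: all 1-sparse vectors in R^n with amplitude on a grid."""
    book = []
    for i in range(n):
        for a in amplitudes:
            c = [0.0] * n
            c[i] = a
            book.append(tuple(c))
    return book


rng = random.Random(1)
n, d = 8, 3                                                # d < n measurements
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]
amplitudes = [j / 4.0 for j in range(-4, 5) if j != 0]
book = sparse_codebook(n, amplitudes)
x_o = tuple(0.5 if i == 3 else 0.0 for i in range(n))      # signal that is a codeword
y_o = measure(A, x_o)
x_hat = csp_recover(A, y_o, book)
print(x_hat)
```

The cost of the search is proportional to the codebook size, i.e., up to $2^R$ residual evaluations, which is the computational burden mentioned above.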

III. Main contributions 

Consider the problem of recovering signal $\mathbf{x}_o \in \mathcal{Q} \subset \mathbb{R}^n$ from $d < n$ linear measurements $\mathbf{y}_o = A\mathbf{x}_o + \mathbf{z}$, where the entries of $A$ are i.i.d. $\mathcal{N}(0, 1)$, and $\mathbf{z} \in \mathbb{R}^d$ represents the measurement noise in the system. Furthermore, assume that there exists a family of compression algorithms, $\{(\mathcal{E}_R, \mathcal{D}_R) : R > 0\}$, for the signals of $\mathcal{Q}$, which has rate-distortion function $R(D)$. We employ the CSP algorithm described in (3) to recover $\mathbf{x}_o$ from $\mathbf{y}_o$.

A. Noiseless measurements 

Our first result is concerned with the performance of the 
CSP algorithm, when there is no noise in the system, i.e., 
z = 0. 

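Theorem 2: Consider compression code $(\mathcal{E}, \mathcal{D})$ for set $\mathcal{Q}$ operating at rate $R$ and distortion $D$. Let $A \in \mathbb{R}^{d \times n}$, where $A_{i,j}$ are i.i.d. $\mathcal{N}(0, 1)$. For $\mathbf{x}_o \in \mathcal{Q}$, let $\hat{\mathbf{x}}_o$ denote the reconstruction of $\mathbf{x}_o$ from $\mathbf{y}_o = A\mathbf{x}_o$ by the CSP algorithm employing $(\mathcal{E}, \mathcal{D})$. Then,

$$\|\mathbf{x}_o - \hat{\mathbf{x}}_o\|_2 \le D \sqrt{\frac{1 + \tau_1}{1 - \tau_2}},$$

with probability at least

$$1 - 2^R e^{\frac{d}{2}(\tau_2 + \log(1 - \tau_2))} - e^{-\frac{d}{2}(\tau_1 - \log(1 + \tau_1))}.$$

Theorem 2 proves that in many cases the CSP algorithm provides an "accurate estimate" of $\mathbf{x}_o$ with a number of measurements that is less than the ambient dimension $n$. Corollary 1 below describes one instance of such cases.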
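To get a feel for the two quantities in Theorem 2, the following sketch evaluates the error bound and the probability lower bound for given $(R, D, d, \tau_1, \tau_2)$. The numbers used (a code with $R = 100$ bits at distortion $D = 0.01$, and $d = 120$ measurements) are hypothetical and do not correspond to any code discussed in the paper; $R$ is counted in bits and logarithms are natural, as in the theorem.

```python
import math


def csp_error_bound(D, tau1, tau2):
    """Error bound D * sqrt((1 + tau1) / (1 - tau2)) from Theorem 2."""
    return D * math.sqrt((1.0 + tau1) / (1.0 - tau2))


def csp_success_probability(R, d, tau1, tau2):
    """Probability lower bound from Theorem 2 (exponents combined to avoid overflow)."""
    miss = math.exp(R * math.log(2.0) + 0.5 * d * (tau2 + math.log(1.0 - tau2)))
    tail = math.exp(-0.5 * d * (tau1 - math.log(1.0 + tau1)))
    return 1.0 - miss - tail


# Hypothetical operating point.
R, D, d = 100, 0.01, 120
bound = csp_error_bound(D, tau1=0.5, tau2=1.0 - D)
prob = csp_success_probability(R, d, tau1=0.5, tau2=1.0 - D)
print(bound, prob)
```

Pushing $\tau_2$ towards 1 shrinks the $2^R$ term but inflates the error bound, which is the tradeoff the corollaries exploit.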



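Corollary 1: Let $D < e^{-1}$ and $d > \frac{4R(D)}{\log(1/(eD))}$. Then

$$P\left(\|\mathbf{x}_o - \hat{\mathbf{x}}_o\|_2 > \sqrt{2D}\right) \le e^{-R(D)} + e^{-\frac{R(D)}{8\log(1/(eD))}}.$$

Proof: Let $\tau_2 = 1 - D$ in Theorem 2. Then, for $\tau_1 < 1$,

$$D\sqrt{\frac{1 + \tau_1}{1 - \tau_2}} = D\sqrt{\frac{1 + \tau_1}{D}} < \sqrt{2D}. \qquad (4)$$

If $d > \frac{4R(D)}{\log(1/(eD))}$, then

$$\begin{aligned} R(D)\log 2 + \frac{d}{2}(\tau_2 + \log(1 - \tau_2)) &= R(D)\log 2 + \frac{d}{2}(1 - D + \log D) \\ &\le R(D)\log 2 + \frac{2R(D)}{\log(1/(eD))}(1 - D + \log D) \\ &\le R(D)\log 2 + \frac{2R(D)}{\log(1/(eD))}\log(eD) \\ &\le -R(D). \end{aligned}$$

Hence,

$$2^R e^{\frac{d}{2}(\tau_2 + \log(1 - \tau_2))} \le e^{-R(D)}. \qquad (5)$$

On the other hand, for $\tau_1 < 1$,

$$\tau_1 - \log(1 + \tau_1) > \tau_1^2/4.$$

Therefore,

$$\frac{d}{2}(\tau_1 - \log(1 + \tau_1)) > \frac{d\tau_1^2}{8}. \qquad (6)$$

Set $\tau_1 = 0.5$. Then,

$$e^{-\frac{d}{2}(\tau_1 - \log(1 + \tau_1))} \le e^{-\frac{d}{32}} \le e^{-\frac{R(D)}{8\log(1/(eD))}}. \qquad (7)$$

The desired result follows from combining the bounds in (4), (5), and (7). ■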
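The two elementary inequalities used in the proof of Corollary 1 can be sanity-checked numerically. The sketch below is only a spot check on a grid, not a proof; the grid points, the rate value, and the function names are assumptions. It verifies that $\tau_1 - \log(1 + \tau_1) > \tau_1^2/4$ on $(0, 1)$ and that, at the threshold $d = 4R/\log(1/(eD))$ with $\tau_2 = 1 - D$, the exponent $R\log 2 + \frac{d}{2}(\tau_2 + \log(1 - \tau_2))$ is at most $-R$.

```python
import math


def gap_tau(tau1):
    """tau1 - log(1 + tau1) - tau1^2 / 4; a positive value confirms the inequality."""
    return tau1 - math.log(1.0 + tau1) - tau1 ** 2 / 4.0


def exponent_at_threshold(R, D):
    """R log 2 + (d/2)(tau2 + log(1 - tau2)) with tau2 = 1 - D and d = 4R / log(1/(eD))."""
    d = 4.0 * R / math.log(1.0 / (math.e * D))
    return R * math.log(2.0) + 0.5 * d * (1.0 - D + math.log(D))


taus = [j / 100.0 for j in range(1, 100)]      # grid on (0, 1)
distortions = [0.3, 0.1, 1e-2, 1e-4, 1e-8]     # all below 1/e
R = 50.0                                       # hypothetical rate
print(min(gap_tau(t) for t in taus))
print([exponent_at_threshold(R, D) for D in distortions])
```

Since the exponent is linear in $R$, checking one value of $R$ per distortion level is enough for the second inequality.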



Note that if $D \ll 1$, then $\frac{R(D)}{\log(1/(eD))}$ is much smaller than $R(D)$ itself. In fact, according to (2), $\limsup_{D \to 0} \frac{R(D)}{\log(1/D)} \le n$. The following corollary characterizes the number of measurements required and the probability of correct recovery as a function of the $\alpha$-dimension.

Corollary 2: Consider a family of rate-distortion codes $\{(\mathcal{E}_R, \mathcal{D}_R) : R > 0\}$ with $\alpha$-dimension $\alpha$. Then for every $\epsilon > 0$, there exist $R > 0$ and a corresponding code $(\mathcal{E}_R, \mathcal{D}_R)$ such that if we employ this code in the CSP algorithm with

$$d > 4\alpha$$

measurements, then



P(||xo-Xo!|2>e)<e- 



4 



Corollary 2 directly follows from Corollary 1 by taking the limit as $D \to 0$. Example 3 shows an application of this corollary.

Example 3: Consider the class of $k$-sparse signals in $\mathbb{R}^n$ discussed in Example 2. It is straightforward to check that there exists a family of codes such that $R(D)/\log(1/D) \downarrow k$ as $D \downarrow 0$. Therefore, from Corollary 2, for every $\epsilon > 0$, the solution of the CSP algorithm satisfies

$$P\left(\|\mathbf{x}_o - \hat{\mathbf{x}}_o\|_2 > \epsilon\right) \le e^{-0.1k}$$

if $d > 4k$.

By choosing different values for $\tau_1$ and $\tau_2$ we can derive different upper bounds for the recovery error. Here is another instance of such a result.

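Corollary 3: If the number of measurements $d$ satisfies $d > 4R(D)/\log n$, then

$$P\left(\frac{1}{\sqrt{n}}\|\mathbf{x}_o - \hat{\mathbf{x}}_o\|_2 > \sqrt{2}D\right) \le e^{-R(D)} + e^{-\frac{d}{32}}.$$

Proof: Choose $\tau_2 = 1 - 1/n$ in Theorem 2. Then, if $d > 4R(D)/\log n$ and $n$ is large enough,

$$R(D)\log 2 + \frac{d}{2}(\tau_2 + \log(1 - \tau_2)) \le R(D)\log 2 + \frac{2R(D)}{\log n}\left(1 - \frac{1}{n} - \log n\right) \le -R(D).$$

Also, choosing $\tau_1 = 0.5$, it follows that

$$\frac{d}{2}(\tau_1 - \log(1 + \tau_1)) > \frac{d}{32},$$

and

$$\sqrt{\frac{1 + \tau_1}{1 - \tau_2}} < \sqrt{2n}. \qquad (8)$$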

As a final remark, we derive a lower bound for the number of measurements required according to Theorem 2 for the "success" of the CSP algorithm. Consider the success probability in Theorem 2. To keep the success probability larger than zero we require

$$R\log 2 + \frac{d}{2}(\tau_2 + \log(1 - \tau_2)) < 0. \qquad (9)$$

Therefore, to reduce the number of measurements, we require $\tau_2$ to be large. But $\tau_2$ is always less than 1. Furthermore, if $\tau_2 > 1 - D^2$, the upper bound on the reconstruction error will be larger than 1, which is a trivial bound for any signal in $\mathcal{B}_2^n(1)$. Hence, we consider $\tau_2 < 1 - D^2$. If we set $\tau_2 = 1 - D^2$ in (9), we obtain

$$d > \frac{2R\log 2}{\log(1/D^2) - 1 + D^2}.$$

B. Noisy measurements 

An inevitable part of any measurement system is noise. In this section we analyze the performance of CSP in the presence of noise. Consider the case where the linear measurements are corrupted by i.i.d. noise, i.e., $\mathbf{y}_o = A\mathbf{x}_o + \mathbf{z}$, where $z_i \sim \mathcal{N}(0, \sigma^2)$, $i = 1, \ldots, d$. To recover signal $\mathbf{x}_o$ from $\mathbf{y}_o$, again we employ the CSP algorithm described in (3).



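Theorem 3: Let $\hat{\mathbf{x}}_o$ denote the solution of CSP to input $\mathbf{y}_o$, using fixed-rate code $(\mathcal{E}, \mathcal{D})$ for compact set $\mathcal{Q}$, which operates at rate $R$ and distortion $D < (5e)^{-1}$. For any $\mathbf{x}_o \in \mathcal{Q}$ and $\eta > 1$, choosing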



log 



1 

cD 



then 



||Xo-Xo|l2 < 



D^] ° cD 
with probability exceeding 



log— + J2i:'+ — + — W f — 



1 _ e log i/(cD) _ 

_ „-(0.8>)-log2)i?^ 



0.6rjR 
2e logl/(oI3) 



-(2»,-l)i?, 



(10) 



Note that $D$ (or equivalently $R$) acts as the free parameter of the CSP algorithm and controls the bias and variance of the final estimate. Intuitively speaking, small values of $D$ lead to large variance since $\sigma$ is divided by $D$ in two terms. Large values of $D$ make the variance small, but increase the bias (due to the $2D$ term under the radical sign). The optimal choice of the free parameter $R$ is dictated by the rate-distortion performance of the code, the number of measurements, and the variance of the noise.

IV. Extension to analog signals 

So far, we have considered the problem of recovering finite-dimensional signals from their undersampled set of linear measurements. But the framework we have developed is applicable to infinite-dimensional spaces as well. In this section, we extend our results to recovering a continuous-time function $f : [0, 1] \to \mathbb{R}$ from a finite number of random linear measurements. We first review the basic concepts required for analyzing continuous-time functions and then propose a recovery algorithm similar to CSP for such signals.

A. Ito's integral 

For continuous-time signals, we consider a measurement that is based on the Wiener process. The Wiener process $W(t)$, a.k.a. Brownian motion, is a continuous-time process that satisfies the following four properties:

1) $P(W(0) = 0) = 1$.

2) The probability that a randomly generated path is continuous is equal to 1.

3) $W(t) - W(s) \sim \mathcal{N}(0, t - s)$, for $0 \le s < t \le 1$.

4) For $0 \le s_1 < t_1 \le s_2 < t_2$, $W(t_1) - W(s_1)$ is independent of $W(t_2) - W(s_2)$.

This process is a key component of stochastic calculus and stochastic differential equations. In particular, Ito's integral, which plays a central role in stochastic differential equations, is defined based on the Wiener process. To keep our discussion simple, we introduce a specific form of Ito's integral that is used in this paper.

For function $f : [0, 1] \to \mathbb{R}$, define its $p$-norm as

$$\|f\|_p = \left(\int_0^1 |f(t)|^p\,dt\right)^{1/p}.$$

Furthermore, define $L_p([0, 1])$ as the set of functions from $[0, 1]$ to $\mathbb{R}$ with finite $p$-norm, i.e.,

$$L_p([0, 1]) = \{f : [0, 1] \to \mathbb{R} \mid \|f\|_p < \infty\}.$$

In this paper we are mainly interested in $L_2([0, 1])$, which includes the set of functions with finite second moment. Also, for some technicalities that become clear later on, we restrict our attention to the subset $\tilde{L}_2([0, 1])$ of $L_2([0, 1])$ that is defined as the set of functions in $L_2([0, 1])$ that are piecewise continuous with a finite number of discontinuities.

Suppose that $f_s \in L_2([0, 1])$ is a simple function, i.e., $f_s$ can be represented as

$$f_s(t) = \sum_{k=1}^{N} c_k \mathbb{1}_{t \in (t_k, t_{k+1}]},$$

where $0 = t_1 < t_2 < \cdots < t_{N+1} = 1$, and $(c_1, \ldots, c_N) \in \mathbb{R}^N$. For such functions, Ito's stochastic integral is defined as

$$\int_0^1 f_s(t)\,dW(t) \triangleq \sum_{k=1}^{N} c_k \left(W(t_{k+1}) - W(t_k)\right).$$

Note that since $(W(t_{k+1}) - W(t_k) : k = 1, \ldots, N)$ are independent Gaussian random variables, the result of this integral is a Gaussian random variable with mean zero and variance $\sum_{k=1}^{N} c_k^2 (t_{k+1} - t_k)$. For $f \in \tilde{L}_2([0, 1])$, let $(f_1, f_2, \ldots)$ be a sequence of simple functions such that

$$\lim_{n \to \infty} \int_0^1 (f(t) - f_n(t))^2\,dt = 0.$$

Then Ito's integral of $f$ is defined as

$$\int_0^1 f(t)\,dW(t) \triangleq \lim_{n \to \infty} \int_0^1 f_n(t)\,dW(t),$$

where the convergence is in the mean square sense. As we will prove in Section VI-E, Ito's integral is defined for any function in $\tilde{L}_2([0, 1])$. We will also show that the result of the integral is a Gaussian random variable.
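The statement that the Ito integral of a simple function is a zero-mean Gaussian with variance $\sum_k c_k^2 (t_{k+1} - t_k)$ can be illustrated by simulation. The sketch below draws the independent Wiener increments directly from property 3; the step levels, the grid, the seed, and the sample size are hypothetical choices made for illustration.

```python
import math
import random


def ito_integral_simple(c, t, rng):
    """One draw of sum_k c_k (W(t_{k+1}) - W(t_k)), using independent
    Gaussian increments with variance t_{k+1} - t_k."""
    return sum(ck * rng.gauss(0.0, math.sqrt(t[k + 1] - t[k]))
               for k, ck in enumerate(c))


def ito_variance(c, t):
    """Variance of the integral of the simple function: sum_k c_k^2 (t_{k+1} - t_k)."""
    return sum(ck ** 2 * (t[k + 1] - t[k]) for k, ck in enumerate(c))


# Hypothetical simple function with three pieces on [0, 1].
c = [1.0, -2.0, 0.5]
t = [0.0, 0.25, 0.5, 1.0]

rng = random.Random(2)
draws = [ito_integral_simple(c, t, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((v - sample_mean) ** 2 for v in draws) / (len(draws) - 1)
print(ito_variance(c, t), sample_mean, sample_var)
```

The empirical mean and variance should be close to 0 and to the closed-form variance, respectively, up to Monte Carlo error.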



B. Rate-distortion function 

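Consider a class of functions $\mathcal{F} \subset \tilde{L}_2([0, 1])$ and a family of compression algorithms $\{(\mathcal{E}_R, \mathcal{D}_R) : R > 0\}$, indexed by rate $R$. For each code in this family, the encoder and decoder mappings, $(\mathcal{E}_R, \mathcal{D}_R)$, are defined as

$$\mathcal{E}_R : \mathcal{F} \to \{1, 2, \ldots, 2^R\}$$

and

$$\mathcal{D}_R : \{1, 2, \ldots, 2^R\} \to \hat{\mathcal{F}},$$

respectively, where $\hat{\mathcal{F}} \subset L_2([0, 1])$ denotes the class of reconstruction functions. For a function $f \in \mathcal{F}$, $\mathcal{D}_R(\mathcal{E}_R(f))$ denotes the reconstruction of function $f$ by the code $(\mathcal{E}_R, \mathcal{D}_R)$. Given compression algorithm $(\mathcal{E}_R, \mathcal{D}_R)$, let $\mathcal{C}_R$ denote its codebook, defined as

$$\mathcal{C}_R \triangleq \{\mathcal{D}_R(\mathcal{E}_R(f)) : f \in \mathcal{F}\}.$$

The distortion-rate function of this family of codes is defined as

$$D(R) \triangleq \sup_{f \in \mathcal{F}} \|f - \mathcal{D}_R(\mathcal{E}_R(f))\|_2.$$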



C. Compressed sensing of analog signals 

1) Measurement process: Unlike the classical compressed sensing setup, where the measurement process is assumed to be in the discrete-time domain, here we consider analog-domain measurements. In particular, for function $f_o \in \tilde{L}_2([0, 1])$, we consider $d$ linear measurements of the form

$$y_i = \int_0^1 f_o(t)\,dW_i(t), \quad \text{for } i = 1, 2, \ldots, d, \qquad (11)$$

where $W_i$, $i = 1, 2, \ldots, d$, are independent Wiener processes. Similar to the discrete-time setting, each measurement is a random linear combination of the signal at different times. As we will show in this section, this type of measurement process ensures that with a "sufficient" number of measurements, the "critical information" about the signal is acquired by the measurements, and therefore, we can recover $f_o$ from the measurement vector $\mathbf{y} \in \mathbb{R}^d$.

2) CSP algorithm: Consider a family of compression algorithms $\{(\mathcal{E}_R, \mathcal{D}_R) : R > 0\}$ for a class of functions $\mathcal{F} \subset \tilde{L}_2([0, 1])$ with distortion-rate function $D(R)$. We are interested in recovering a function $f_o \in \mathcal{F} \subset \tilde{L}_2([0, 1])$ from $d$ linear measurements $\mathbf{y}_o \in \mathbb{R}^d$ such that, for $i = 1, \ldots, d$,

$$y_{o,i} = \int_0^1 f_o(t)\,dW_i(t). \qquad (12)$$

Let $\mathcal{A} : L_2([0, 1]) \to \mathbb{R}^d$ denote the just-defined linear measurement process, i.e., $\mathbf{y}_o = \mathcal{A}(f_o)$. To recover the function $f_o$ from $\mathbf{y}_o$ we employ the CSP algorithm defined as

$$\hat{f}_o = \arg\min_{f \in \mathcal{C}_R} \|\mathbf{y}_o - \mathcal{A}(f)\|_2^2. \qquad (13)$$

The intuition for the CSP algorithm is the same as what we proposed before: among all the low-complexity signals (defined according to the compression algorithm), look for the one that matches the measurements the best. The parameter $R$ can be considered as a free parameter in the algorithm, whose role will be clear in the next section.
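When every candidate in the codebook is a step function on a common grid, the analog measurement in (12) reduces to a finite sum over Wiener increments, so the search in (13) can be simulated directly. The sketch below makes that assumption explicitly: the grid, the three quantization levels, the number of measurements, and all names are hypothetical. The same increments must be reused for every candidate, since they all pass through the same measurement operator $\mathcal{A}$.

```python
import itertools
import math
import random


def wiener_increments(d, t, rng):
    """Increments W_i(t_{k+1}) - W_i(t_k) of d independent Wiener processes on grid t."""
    return [[rng.gauss(0.0, math.sqrt(t[k + 1] - t[k])) for k in range(len(t) - 1)]
            for _ in range(d)]


def measure_step(increments, c):
    """y_i = sum_k c_k (W_i(t_{k+1}) - W_i(t_k)) for the step function with levels c."""
    return [sum(ck * dw for ck, dw in zip(c, row)) for row in increments]


def csp_recover_step(increments, y, codebook):
    """Exhaustive search (13) over step-function codewords."""
    def residual(c):
        return sum((yi - zi) ** 2 for yi, zi in zip(y, measure_step(increments, c)))
    return min(codebook, key=residual)


t = [0.0, 0.25, 0.5, 0.75, 1.0]                   # common grid (assumption)
levels = (-1.0, 0.0, 1.0)                         # quantized step heights (assumption)
book = list(itertools.product(levels, repeat=len(t) - 1))

rng = random.Random(3)
increments = wiener_increments(2, t, rng)         # d = 2 measurements
f_o = (1.0, 0.0, -1.0, 1.0)                       # true function, itself a codeword
y_o = measure_step(increments, f_o)
f_hat = csp_recover_step(increments, y_o, book)
print(f_hat)
```

Here 81 candidate functions are separated by only two random measurements, which mirrors the finite-dimensional behavior of CSP.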

3) Performance guarantees for CSP: Consider the problem of recovering function $f_o \in \mathcal{F} \subset \tilde{L}_2([0, 1])$ from its undersampled set of $d$ random linear measurements, $\mathbf{y}_o = \mathcal{A}(f_o)$, as defined in (12). Assume that there exists a family of compression algorithms $(\mathcal{E}, \mathcal{D})$ for $\mathcal{F}$, indexed by $R$, that achieves the rate-distortion function $R(D)$. We employ the CSP algorithm to recover $f_o$. The following theorem characterizes the performance of the CSP algorithm.

Theorem 4: For $f_o \in \mathcal{F}$, let $\hat{f}_o$ denote the reconstruction of $f_o$ from $\mathbf{y}_o = \mathcal{A}(f_o)$ by the CSP algorithm employing rate-$R$ compression algorithm $(\mathcal{E}, \mathcal{D})$. Then,

$$\|f_o - \hat{f}_o\|_2 \le D\sqrt{\frac{1 + \tau_1}{1 - \tau_2}},$$

with probability at least

$$1 - 2^R e^{\frac{d}{2}(\tau_2 + \log(1 - \tau_2))} - e^{-\frac{d}{2}(\tau_1 - \log(1 + \tau_1))}.$$

See Section VI-E for the proof.

Theorem 4 considers noiseless measurements. In the case of noisy measurements, assume that

$$y_i = \int_0^1 f_o(t)\,dW_i(t) + z_i,$$

where $z_i \sim \mathcal{N}(0, \sigma^2)$ represents the measurement noise. As the following result shows, even in this infinite-dimensional setting, the algorithm is robust to noise.

Theorem 5: Let $\hat{f}_o$ denote the solution of CSP to input $\mathbf{y}_o$, using code $(\mathcal{E}, \mathcal{D})$ at rate $R$ and distortion $D$ for compact set $\mathcal{F}$. For distortion $D < (5e)^{-1}$, and for any $f_o \in \mathcal{F}$ and $\eta > 1$, choosing



logs? 



then 



ll/o-./o|i2< 



log 



1 



° eD 

with probability exceeding 



2(7 cr2 1 



1 _ e iogi/(oD) _ 

_ „-(0.87,-log2)i?. 



_ O.GrjR 
2e logl/(cI3) 

„-0.3fl. 



^(2»,-l)K 



(14) 



After employing Lemmas 4 and 5, the proof of this result is similar to the proof of Theorem 3. Hence we skip the proof.

Comparing Theorems 2 and 3 with Theorems 4 and 5 may lead us to the conclusion that the ambient dimension of the signal is not important in the performance of the CSP algorithm. However, this conclusion is not true. For further information regarding this issue see Section IV-E.

D. Applications 

In this section, we investigate the implications of Theorem 4 for three different classes of continuous-time signals. As we will see in these examples, in the case of infinite-dimensional signals, the rate-distortion performance shows more diverse types of behavior. The different rate-distortion behaviors of these classes clarify the opportunities and limitations of analog CS.

Let $\mathcal{P}_Q^N(A)$ denote the class of piecewise polynomial functions with $N$, $Q$, $A$ representing the maximum degree of the polynomials, the number of singularity points$^1$, and the maximum value of the function, respectively.

Example 4: There exists a family of compression algorithms for $\mathcal{P}_Q^N(A)$ that achieves

$$R(D) = (N + 3)(Q + 1)\log\left(\frac{1}{D}\right) + c,$$

where $c$ is a constant independent of $D$ [21].

The rate-distortion behavior described in Example 4 for $\mathcal{P}_Q^N(A)$ is reminiscent of the rate-distortion function of finite-dimensional spaces. However, since the locations of singularities are not fixed, the space is not finite dimensional. Nevertheless, we would expect to recover the signals of this class with a finite number of measurements with small error.

$^1$ A singularity point is a point at which the signal is not infinitely differentiable.

Corollary 4: For every $\epsilon > 0$, there exist $R > 0$ and a corresponding code $(\mathcal{E}_R, \mathcal{D}_R)$ at rate $R$, such that if we employ this code in the CSP algorithm with

$$d > 4(N + 3)(Q + 1)$$

measurements, then for any $f_o \in \mathcal{P}_Q^N(A)$, the reconstruction error satisfies

$$P\left(\|f_o - \hat{f}_o\|_2 > \epsilon\right) \le e^{-0.1(N + 3)(Q + 1)}.$$

Due to the similarity of Theorem 4 and Theorem 2, the proof of this corollary is exactly the same as the proof of Corollary 2.

For finite-dimensional spaces we could show that for any compact subset of $\mathbb{R}^n$ there exists a compression algorithm whose rate-distortion function satisfies $R(D) = O(n \log(1/D))$. However, in infinite-dimensional spaces this is no longer the case. The next example illustrates a class with a slightly different rate-distortion behavior.

Let $\mathcal{H}_h(C)$ be a class of functions $f : \mathbb{C} \to \mathbb{C}$ satisfying the following properties:

a) $f$ is analytic on a strip of size $h$, i.e., $f(z)$ is analytic on $\{z = x + iy : |y| < h\}$.

b) $|f(z)|$ is bounded by $C$.

Define $\mathcal{G}_h(C)$ as

$$\mathcal{G}_h(C) \triangleq \{g : [0, 1] \to \mathbb{R} : \exists f \in \mathcal{H}_h(C),\ g(x) = |f(x + i0)|\}.$$

Example 5: There exists a family of compression algorithms for $\mathcal{G}_h(C)$ that achieves

$$R(D) \le c \log^2\left(\frac{1}{D}\right),$$

where $c$ is a constant that does not depend on $D$.

Clearly, for this class of functions the $\alpha$-dimension is infinite; therefore, we do not expect results similar to Corollary 2 to hold for this class. However, the CSP algorithm is still useful for this class, as described in the next corollary.

Corollary 5: Let $D < 1$ and $d > 4c\log(1/D)$. Then

P(||io - >V2D)< c-i°s^(Vc) + g-^iHsom^ 

The proof is similar to the proof of Corollary 1.

While the number of required measurements tends to infin- 
ity, as the distortion goes to zero, it only grows logarithmically 
with the distortion. Therefore, intuitively speaking, accurate 
estimates are still obtained from few measurements. 

If the class of functions is too rich (less structured), then 
the growth rate of rate-distortion will be faster and therefore 
to obtain reasonably accurate reconstruction we may require 
many observations. Here, we present one such example. 

Consider the class of smooth functions for which the coders require higher bit-rates to achieve the same distortion level. Let $\mathcal{H}^{\alpha}(C)$ be the class of real functions $f : [0, 1] \to \mathbb{R}$ having a derivative of order $\alpha$ (in the sense of Riemann-Liouville) uniformly bounded by some constant $C$.






Example 6: There exists a family of compression algorithms for $\mathcal{H}^{\alpha}(C)$ that achieves

$$R(D) \le c\left(\frac{1}{D}\right)^{1/\alpha},$$

where $c$ is a constant independent of $D$.

According to Theorem 4, having $O\left(\frac{D^{-1/\alpha}}{\log(1/D)}\right)$ measurements, the reconstruction error is bounded by $\sqrt{2D}$. But $O\left(\frac{D^{-1/\alpha}}{\log(1/D)}\right)$ is much larger than the number of bits that are required for achieving distortion $\sqrt{2D}$, i.e., $O\left(D^{-1/(2\alpha)}\right)$. Therefore, in such cases, the CSP algorithm is not particularly interesting. In fact, the number of measurements required grows rapidly as we decrease the distortion compared to other classes.

E. Discussion 

The results we have discussed so far are the same for finite- and infinite-dimensional classes. Such results may mislead us to the conclusion that, as far as the performance of CSP is concerned, the ambient dimension is not important. However, the finiteness of the ambient dimension may help derive stronger results. The next theorem is an instance of such results.

Theorem 6: Let $A \in \mathbb{R}^{d \times n}$ be a measurement matrix with $A_{ij} \sim \mathcal{N}(0, 1)$. For any $\mathbf{x}_o \in \mathcal{Q}$, we denote the reconstruction of the CSP algorithm at rate $R$ by $\hat{\mathbf{x}}_o$. We have



P Vxo e Q : i|xo-io||2 > 



1 - r 



+ (1 + 01? 



See Section VI-F for the proof.

Note that there is a major difference between Theorem 6 and Theorem 2. Theorem 6 claims that once we draw a random matrix from the Gaussian distribution, this matrix with high probability works for any signal in $\mathcal{Q}$. However, Theorem 2 considers individual sequences. Note that the strength of Theorem 6 comes at the price of a larger reconstruction error and a lower success probability.

V. Related work 

A. Connection of compression and compressed sensing 

In this paper we consider the problem of using a family of compression algorithms for compressed sensing. The other direction, i.e., using CS for compression, has also been extensively studied in the literature [23]-[31]. In this line of work, the rate-distortion performance that is achieved by scalar (or in a few cases adaptive) quantization of random linear measurements has been derived. However, such results are different from our work since they only consider either sparse or approximately sparse signals. Furthermore, we consider a different direction, that is, the direction of deriving CS recovery algorithms based on compression schemes.

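B. Kolmogorov's $\epsilon$-entropy and compressed sensing

The $\epsilon$-entropy of a compact set $\mathcal{Q}$ is defined as

$$H_\epsilon(\mathcal{Q}) \triangleq \log N_\epsilon(\mathcal{Q}),$$

where $N_\epsilon(\mathcal{Q})$ is the minimum number of elements in an $\epsilon$-covering of $\mathcal{Q}$ [22]. The $\epsilon$-entropy, $H_\epsilon$, provides a lower bound on the rate-distortion function of any family of compression algorithms. In other words, if $R(D)$ is the rate-distortion function of a family of compression algorithms on $\mathcal{Q}$, then

$$R(D) \ge H_D(\mathcal{Q}).$$

It is clear that our results can be stated in terms of Kolmogorov's $\epsilon$-entropy by considering it as the optimal compression scheme from the perspective of the rate-distortion tradeoff. In particular, Corollary 2 leads to the following result:

Corollary 6: Consider compact set $\mathcal{Q} \subset \mathbb{R}^n$. For $\epsilon > 0$, let $\mathcal{C}_\epsilon$ denote an $\epsilon$-covering of $\mathcal{Q}$ such that $\log|\mathcal{C}_\epsilon| = H_\epsilon(\mathcal{Q})$, and assume that

$$\limsup_{\epsilon \to 0} \frac{H_\epsilon(\mathcal{Q})}{\log(1/\epsilon)} \le \alpha.$$

For $\mathbf{x}_o \in \mathcal{Q}$ and $\epsilon > 0$, let $\hat{\mathbf{x}}_{o,\epsilon}$ denote the reconstruction of CSP employing $\mathcal{C}_\epsilon$, i.e.,

$$\hat{\mathbf{x}}_{o,\epsilon} = \arg\min_{\mathbf{c} \in \mathcal{C}_\epsilon} \|A\mathbf{c} - A\mathbf{x}_o\|_2^2.$$

If $d > 4\alpha$, then, choosing $\epsilon$ small enough,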



> 



The quantity

$$\limsup_{\epsilon \to 0} \frac{H_\epsilon(\mathcal{Q})}{\log(1/\epsilon)}$$

is called the upper metric dimension [22] or Minkowski dimension [32]. Metric dimension is a measure of the massiveness of compact sets in finite-dimensional spaces [22]. The connection between Minkowski dimension and CS has also been explored in the stochastic settings that are reviewed in Section V-C.
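
As a rough numerical illustration of the metric dimension, the following sketch uses the unit cube $[0,1]^n$ as the compact set and a uniform grid as the ε-net. Both choices, and the function names `grid_log_cover_size` and `dimension_ratio`, are assumptions made for this example only. The grid gives an upper bound on $H_\epsilon$ (its centers may lie slightly outside the cube), and the ratio to $\log(1/\epsilon)$ should approach $n$ as $\epsilon$ shrinks.

```python
import math

def grid_log_cover_size(n, eps):
    """log2 of the number of grid cells of side 2*eps/sqrt(n) needed to tile [0,1]^n.

    Every point of a cell is within eps (in l2 distance) of the cell center,
    so the cell centers form an eps-net and this upper-bounds the eps-entropy.
    """
    side = 2.0 * eps / math.sqrt(n)
    per_axis = math.ceil(1.0 / side)
    return n * math.log2(per_axis)

def dimension_ratio(n, eps):
    """Grid-based estimate of H_eps / log(1/eps)."""
    return grid_log_cover_size(n, eps) / math.log2(1.0 / eps)

if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-8, 1e-12):
        print(eps, round(dimension_ratio(3, eps), 3))
```

The ratio converges slowly because of the constant factor $\sqrt{n}/2$ inside the logarithm, which is why the limit (rather than any fixed ε) defines the dimension.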

C. Stochastic settings 

This paper considers a deterministic signal model. However, stochastic settings have also been considered in CS [33]-[43]. In such models the data is assumed to follow a certain distribution (often i.i.d.), and the probability of correct recovery is measured as the ambient dimension tends to infinity. In many cases the algorithms exhibit phase transitions in the probability of correct recovery. Such phase transitions have been characterized in certain cases either theoretically or
empirically 1221, ED, l40l-||42l. 

The works most relevant to ours are [40], [42]. These two papers characterize the performance of "information-theoretically" optimal algorithms in the asymptotic setting. For instance, they prove that the number of measurements required for "exact" recovery is the same as the Rényi information dimension. Even though there is an interesting connection between Rényi information dimension and metric dimension [44], there are several major differences between our work and the work of [40], [42]. First, our framework is concerned with deterministic signal models. Second, our results are for finite-dimensional signals and are non-asymptotic. Third, we consider an arbitrary family of compression algorithms and characterize when such schemes can be used for signal recovery from random linear measurements.

D. Kolmogorov complexity

Our work is mainly inspired by series of work on the 
connection between Kolmogorov complexity of sequences 
and CS 1451 - 15 H . In particular, B31 defines the Kolmogorov 
information dimension of x = {xi,X2, ■ ■ ■ ,Xn) S [0,1]" at 
resolution m as 



■ 1 •^n 



where intuitively speaking, K^'^™-{xi^ X2, ■ ■ ■ , Xn) denotes the 
Kolmogorov complexity of vector x where each component 
is quantized by m bits, and proves that if the Kolmogorov 
information dimension of a sequence is small compared to its 
ambient dimension one can recover it from an undersampled 
set of linear measurements. Our results have several con- 
nections with ||451 . The proof techniques we use here have 
similarities to the proof techniques used in P31 . However, not 
only the problems are different, but also there are some major 
differences. One example is the continuous time domain that 
requires different treatment of the problem. Also, our result 
is the first step in a new direction toward practical implemen- 
tation of P31 . While the CSP algorithm is computationally 
demanding at this point, it provides an approach to designing 
sub-optimal algorithms such as greedy methods. Furthermore, 
CSP algorithm may enable us to employ universal compres- 
sion algorithms 1521 . If53l and develop universal compressed 
sensing methods. This has been the main goal of 1451 . BtI - 

ED 



VI. Proofs 



A. Background 



We use the following two lemmas from lISTl throughout our 
proofs. 

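Lemma 2 ($\chi^2$-concentration): Fix $\tau > 0$, and let $Z_i \sim \mathcal{N}(0,1)$, $i = 1, 2, \ldots, d$. Then,

$$P\Big(\sum_{i=1}^{d} Z_i^2 < d(1-\tau)\Big) \leq e^{\frac{d}{2}(\tau + \log(1-\tau))}$$

and

$$P\Big(\sum_{i=1}^{d} Z_i^2 > d(1+\tau)\Big) \leq e^{-\frac{d}{2}(\tau - \log(1+\tau))}. \quad (15)$$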
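
The two tail bounds of Lemma 2 can be sanity-checked by simulation. The sketch below is illustrative only: the values of `d`, `tau`, the number of trials, and the helper name `chi2_tail_bounds` are arbitrary choices for this example, and a Monte Carlo comparison is of course not a proof.

```python
import numpy as np

def chi2_tail_bounds(d, tau):
    """Upper bounds from Lemma 2 on the lower and upper chi-square tails."""
    lower = np.exp(0.5 * d * (tau + np.log(1.0 - tau)))   # requires tau < 1
    upper = np.exp(-0.5 * d * (tau - np.log(1.0 + tau)))
    return lower, upper

rng = np.random.default_rng(0)
d, tau, trials = 20, 0.5, 20000

# Each row holds d i.i.d. N(0,1) samples; s is the sum of their squares.
s = (rng.standard_normal((trials, d)) ** 2).sum(axis=1)

emp_lower = np.mean(s < d * (1 - tau))
emp_upper = np.mean(s > d * (1 + tau))
bound_lower, bound_upper = chi2_tail_bounds(d, tau)

print(f"lower tail: empirical {emp_lower:.4f}, bound {bound_lower:.4f}")
print(f"upper tail: empirical {emp_upper:.4f}, bound {bound_upper:.4f}")
```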



Lemma 3: Let $X$ and $Y$ denote two independent Gaussian vectors of length $n$ with i.i.d. elements. Further, assume that for $i = 1, \ldots, n$, $X_i \sim \mathcal{N}(0,1)$ and $Y_i \sim \mathcal{N}(0,1)$. Then the distribution of $X^T Y = \sum_{i=1}^{n} X_i Y_i$ is the same as the distribution of $\|X\|_2 G$, where $G \sim \mathcal{N}(0,1)$ is independent of $\|X\|_2$.
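
A quick simulation illustrates Lemma 3: the normalized inner product $X^T Y / \|X\|_2$ should behave like a standard Gaussian that does not co-vary with $\|X\|_2$. The sample sizes below are arbitrary, and the check only looks at the mean, standard deviation, and correlation, which is weaker than the full independence the lemma states.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 8, 50000

X = rng.standard_normal((trials, n))
Y = rng.standard_normal((trials, n))

inner = np.einsum("ij,ij->i", X, Y)      # row-wise inner products X^T Y
norms = np.linalg.norm(X, axis=1)        # ||X||_2 for each trial
G = inner / norms                        # candidate for the N(0,1) variable G

g_mean = G.mean()
g_std = G.std()
corr = np.corrcoef(G, norms)[0, 1]       # should be near zero

print(f"mean {g_mean:.3f}, std {g_std:.3f}, corr with ||X||_2 {corr:.3f}")
```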



B. Calculation of rate-distortion function 

In this section we briefly summarize the proof of Example 
1 and Example 2. 



1) Proof of Example 1: For notational simplicity we set $p = 1$. Finding a compression algorithm for $B_2^n(1)$ is equivalent to covering $B_2^n(1)$ with $\ell_2$-balls of radius $D$. Consider the following grid points for the interval $[-1, 1]$:



IT 



D 



D 



D 



D 



D 



It is straightforward to show that $\ell_2$-balls of radius $D$ with centers on

$$\mathcal{G}_n = \underbrace{\mathcal{G}_1 \times \mathcal{G}_1 \times \cdots \times \mathcal{G}_1}_{n}$$

cover the entire space $B_2^n(1)$. Therefore, our compression scheme maps each vector to its closest codeword, i.e.,

$$\mathcal{D}(\mathcal{E}(x)) = \arg\min_{z \in \mathcal{G}_n} \|z - x\|_2.$$

If the minimizer is not unique, the compression algorithm chooses one of the minimizers at random. The rate that such a compression algorithm achieves is equal to



< 



< 



log- 
log 
log 











D 





D 



D 



nlog(v^) + nlog 



nlog(5). 
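
The grid-based code of this proof can be sketched as follows. Because the exact grid did not survive extraction, the sketch assumes a uniform per-coordinate grid with spacing $2D/\sqrt{n}$; this spacing and the names `grid_quantize` and `grid_rate_bits` are assumptions for illustration and may differ from the grid used in the proof. With this spacing each coordinate is off by at most $D/\sqrt{n}$, so the $\ell_2$ error is at most $D$.

```python
import math
import numpy as np

def grid_quantize(x, D):
    """Round each coordinate to the nearest multiple of 2D/sqrt(n)."""
    step = 2.0 * D / math.sqrt(x.size)
    return step * np.round(x / step)

def grid_rate_bits(n, D):
    """Bits needed to index the product grid covering [-1, 1]^n."""
    step = 2.0 * D / math.sqrt(n)
    levels = 2 * math.ceil(1.0 / step) + 1   # grid points per coordinate
    return n * math.log2(levels)

rng = np.random.default_rng(0)
n, D = 16, 0.1

# Draw a point uniformly from the unit l2-ball.
x = rng.standard_normal(n)
x *= rng.uniform() ** (1.0 / n) / np.linalg.norm(x)

x_q = grid_quantize(x, D)
dist = np.linalg.norm(x - x_q)
print(f"distortion {dist:.4f} (target {D}), rate {grid_rate_bits(n, D):.1f} bits")
```

Coordinate-wise rounding finds the closest grid point exactly because the grid is a product set; a quantized point may fall slightly outside the unit ball, which does not affect the distortion guarantee.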



2) Proof of Example 2: Our encoding scheme is inspired by the previous example. The space of all $k$-sparse signals has $\binom{n}{k}$ hyperplanes. Once we specify the hyperplane $\mathcal{H}$, $\mathcal{H} \cap B_2^n(1)$ is an $\ell_2$-ball of radius 1 in a $k$-dimensional subspace. Therefore, according to Example 1 we require



ck 



bits to code it with distortion smaller than D in a specified 
subspace. Therefore, overall we require log (j^') for coding the 

subspace, and log (^^^^ for specifying the codeword on 
each hyperplane. This proves 

k 




R(D) = log 




ck. 



C. Noiseless measurements 

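Proof of Theorem 2: Let $x_o \in \mathcal{Q}$, $y_o = A x_o$, $\tilde{x}_o = \mathcal{D}(\mathcal{E}(x_o))$, and $\hat{x}_o = \arg\min_{c \in \mathcal{C}} \|y_o - Ac\|_2^2$.

Since $\hat{x}_o$ minimizes the measurement error $\|y_o - Ac\|_2^2$ over all $c \in \mathcal{C}$, and $\tilde{x}_o \in \mathcal{C}$, we have

$$\|y_o - A\hat{x}_o\|_2 \leq \|y_o - A\tilde{x}_o\|_2 = \|A x_o - A\tilde{x}_o\|_2 = \|x_o - \tilde{x}_o\|_2 \|u_1\|_2, \quad (16)$$

where

$$u_1 \triangleq \frac{A(x_o - \tilde{x}_o)}{\|x_o - \tilde{x}_o\|_2}.$$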

Since the entries of $A$ are i.i.d. Gaussian, $u_1$ is a vector of $d$ independent zero-mean Gaussian random variables with variance 1. For $\tau_1 > 0$, define the event

$$\mathcal{E}_1 \triangleq \{\|u_1\|_2^2 \leq d(1+\tau_1)\}.$$

By Lemma 2,

$$P(\mathcal{E}_1^c) = P(\|u_1\|_2^2 > d(1+\tau_1)) \leq e^{-\frac{d}{2}(\tau_1 - \log(1+\tau_1))}.$$

Since $\tilde{x}_o$ is the reconstruction of $x_o$ using the rate-distortion code $(\mathcal{E}, \mathcal{D})$, it follows that $\|x_o - \tilde{x}_o\|_2 \leq D$. Therefore, conditioned on $\mathcal{E}_1$, (16) gives

$$\|y_o - A\hat{x}_o\|_2 \leq D\sqrt{d(1+\tau_1)}. \quad (17)$$



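To find a lower bound on $\|y_o - A\hat{x}_o\|_2$, note that for a fixed $c \in \mathcal{C}$,

$$\|y_o - Ac\|_2 = \|A(x_o - c)\|_2 = \|x_o - c\|_2 \|u_2\|_2,$$

where

$$u_2 \triangleq \frac{A(x_o - c)}{\|x_o - c\|_2}.$$

Similar to $u_1$, $u_2$ is a $d$-dimensional random vector distributed as $\mathcal{N}(0, I_d)$. Note that $u_2$ depends on $c$. For $\tau_2 \in (0,1)$, define the event $\mathcal{E}_2$ as

$$\mathcal{E}_2 \triangleq \{\forall c \in \mathcal{C} : \|u_2\|_2^2 \geq d(1-\tau_2)\}.$$

By Lemma 2 and the union bound over the at most $2^R$ codewords of $\mathcal{C}$, it follows that

$$P(\mathcal{E}_2^c) \leq \sum_{c \in \mathcal{C}} P(\|u_2\|_2^2 < d(1-\tau_2)) \leq 2^R e^{\frac{d}{2}(\tau_2 + \log(1-\tau_2))}. \quad (18)$$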



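Combining the two events, conditioned on both $\mathcal{E}_1$ and $\mathcal{E}_2$ holding,

$$\sqrt{d(1-\tau_2)}\,\|x_o - \hat{x}_o\|_2 \leq D\sqrt{d(1+\tau_1)}, \quad (19)$$

or equivalently,

$$\|x_o - \hat{x}_o\|_2 \leq D\sqrt{\frac{1+\tau_1}{1-\tau_2}}.$$

Finally, by the union bound,

$$P(\mathcal{E}_1 \cap \mathcal{E}_2) = 1 - P(\mathcal{E}_1^c \cup \mathcal{E}_2^c) \geq 1 - P(\mathcal{E}_1^c) - P(\mathcal{E}_2^c). \quad (20)$$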
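
The noiseless argument can be illustrated with a small brute-force simulation of CSP. Everything specific in this sketch is an assumption for illustration: the codebook is a set of $2^R$ random unit-norm vectors rather than the output of a real compression algorithm, the dimensions are tiny, and `csp_recover` is a hypothetical helper that searches the codebook exhaustively. The only property guaranteed deterministically is the first step of the proof, namely that the residual of $\hat{x}_o$ is no larger than that of $\tilde{x}_o$.

```python
import numpy as np

def csp_recover(A, y, codebook):
    """Return the codeword c (a row of codebook) minimizing ||y - A c||_2."""
    residuals = np.linalg.norm(y[None, :] - codebook @ A.T, axis=1)
    return codebook[np.argmin(residuals)]

rng = np.random.default_rng(0)
n, d, R = 64, 24, 10             # ambient dimension, measurements, rate in bits
D = 0.05                         # distortion of the toy code

# Hypothetical codebook: 2^R random unit-norm codewords stored as rows.
codebook = rng.standard_normal((2 ** R, n))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

# A signal within distance D of one codeword, which then plays the role of x_tilde.
x_tilde = codebook[123]
e = rng.standard_normal(n)
x_o = x_tilde + 0.9 * D * e / np.linalg.norm(e)

A = rng.standard_normal((d, n))  # i.i.d. N(0,1) measurement matrix, d < n
y_o = A @ x_o
x_hat = csp_recover(A, y_o, codebook)

err = np.linalg.norm(x_o - x_hat)
res_hat = np.linalg.norm(y_o - A @ x_hat)
res_tilde = np.linalg.norm(y_o - A @ x_tilde)
print(f"recovery error {err:.4f}, code distortion D = {D}")
```

The exhaustive search costs on the order of $2^R \cdot d \cdot n$ operations, which reflects why CSP is described as computationally demanding.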



D. Noisy measurements 

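Proof of Theorem 5: Let $\tilde{x}_o = \mathcal{D}(\mathcal{E}(x_o))$. Since by assumption the code operates at distortion $D$, we have $\|x_o - \tilde{x}_o\|_2 \leq D$. On the other hand, since $\hat{x}_o$ is the solution of the CSP optimization and $y_o = A x_o + z$,

$$\|y_o - A\hat{x}_o\|_2 = \|A(x_o - \hat{x}_o) + z\|_2 \leq \|A(x_o - \tilde{x}_o) + z\|_2. \quad (21)$$

Squaring and expanding both sides of (21) and canceling the common term $\|z\|_2^2$, we obtain

$$\|A(x_o - \hat{x}_o)\|_2^2 + 2z^T A(x_o - \hat{x}_o) \leq \|A(x_o - \tilde{x}_o)\|_2^2 + 2z^T A(x_o - \tilde{x}_o). \quad (22)$$

Let

$$u_1 \triangleq \frac{A(x_o - \tilde{x}_o)}{\|x_o - \tilde{x}_o\|_2}$$

and

$$u_2 \triangleq \frac{A(x_o - \hat{x}_o)}{\|x_o - \hat{x}_o\|_2}.$$

Using these definitions, along with the triangle inequality and $a \leq |a|$, we rewrite (22) as

$$\|x_o - \hat{x}_o\|_2^2 \|u_2\|_2^2 - 2\|x_o - \hat{x}_o\|_2 |z^T u_2| \leq \|x_o - \tilde{x}_o\|_2^2 \|u_1\|_2^2 + 2\|x_o - \tilde{x}_o\|_2 |z^T u_1|. \quad (23)$$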

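For $\tau_1 > 0$ and $\tau_2 \in (0,1)$, define events $\mathcal{E}_1$ and $\mathcal{E}_2$ as

$$\mathcal{E}_1 \triangleq \{\|A(x_o - \tilde{x}_o)\|_2^2 \leq (1+\tau_1)\,d\,\|x_o - \tilde{x}_o\|_2^2\}$$

and

$$\mathcal{E}_2 \triangleq \{\forall c \in \mathcal{C} : \|A(x_o - c)\|_2^2 \geq d(1-\tau_2)\|x_o - c\|_2^2\}.$$

Conditioned on $\mathcal{E}_1 \cap \mathcal{E}_2$, we can upper bound $\|u_1\|_2^2$ by $d(1+\tau_1)$, and lower bound $\|u_2\|_2$ by $\sqrt{d(1-\tau_2)}$. In order to bound $|z^T u_1|$ and $|z^T u_2|$, we employ Lemma 3. Given $x_o$, $\tilde{x}_o$ and $\hat{x}_o$, both $u_1$ and $u_2$ are i.i.d. Gaussian vectors with mean zero and variance one, and are both independent of $z$. Therefore, by Lemma 3, $z^T u_1$ and $z^T u_2$ are respectively distributed as $\|z\|_2 G_1$ and $\|z\|_2 G_2$, where $G_1$ and $G_2$ are zero-mean, variance-one Gaussian random variables independent of $\|z\|_2$. For $\gamma_1, \gamma_2 > 1$, define events $\mathcal{E}_3$ and $\mathcal{E}_4$ as

$$\mathcal{E}_3 \triangleq \{|z^T u_1| \leq \gamma_1 \sigma \sqrt{d}\}$$

and

$$\mathcal{E}_4 \triangleq \{\forall c \in \mathcal{C} : |z^T u_2| \leq \gamma_2 \sigma \sqrt{d}\},$$

where in $\mathcal{E}_4$ the vector $u_2$ is formed with $c$ in place of $\hat{x}_o$.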
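As argued above,

$$\begin{aligned}
P(\mathcal{E}_3^c) &= P(\|z\|_2 |G_1| > \gamma_1 \sigma \sqrt{d}) \\
&= P\big(\|z\|_2 |G_1| > \gamma_1 \sigma \sqrt{d},\ \|z\|_2 > \sigma\sqrt{d(1+\tau_3)}\big) \\
&\quad + P\big(\|z\|_2 |G_1| > \gamma_1 \sigma \sqrt{d},\ \|z\|_2 \leq \sigma\sqrt{d(1+\tau_3)}\big) \\
&\leq P\big(\|z\|_2 > \sigma\sqrt{d(1+\tau_3)}\big) + P\big(|G_1| > \gamma_1 (1+\tau_3)^{-1/2}\big) \\
&\leq e^{-\frac{d}{2}(\tau_3 - \log(1+\tau_3))} + e^{-\frac{\gamma_1^2}{2(1+\tau_3)}}, \quad (24)
\end{aligned}$$

where $\tau_3 > 0$, and the last line follows from Lemma 2 and the Gaussian tail bound $P(|G_1| > t) \leq e^{-t^2/2}$. Adding the union bound to this analysis, we get

$$P(\mathcal{E}_4^c) \leq 2^R \Big(e^{-\frac{d}{2}(\tau_4 - \log(1+\tau_4))} + e^{-\frac{\gamma_2^2}{2(1+\tau_4)}}\Big), \quad (25)$$

where $\tau_4 > 0$.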

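Conditioned on $\mathcal{E}_1 \cap \ldots \cap \mathcal{E}_4$, it follows from (23), after dividing by $\sqrt{d}$, that

$$\sqrt{d}(1-\tau_2)\|x_o - \hat{x}_o\|_2^2 - 2\gamma_2\sigma\|x_o - \hat{x}_o\|_2 - D^2(1+\tau_1)\sqrt{d} - 2D\gamma_1\sigma \leq 0. \quad (26)$$

Before analyzing the roots of (26), consider the probability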
of £i n . . . n £4- By the union bound, parameters values, we derive



P(fin...n54)>l-P(fD-...-P(f4'). n M^^ II ' 112 o ^11 ^11 

DJ- ^||Xo - X0II2 - 2cr\/4iT!log — ||xo - X0II2 

To make sure that P{£i Ci . . . Ci £4) is close to one for large y 
values of d, it suffices to show that P(£',f) 0, for i — I ^ j 

1,..., 4. By Lemma m - 2D^ J - 2Da J ^ < 0. (32) 

V log 771 V log 771 



P(£0 < c- 



-|(ri-log(l+ri)) 



The quadratic equation ax^ — 2bx — c = 0, with a,b,c > 0, 
and by Lemma |2] and the union bound, has one positive and one negative root. Hence, ||xo - 2 is 



P(£^) < 2^0^*^^^^'°^'-^^'^^^^ smaller than the positive root of (|32> . Therefore, conditioned 

~ on £1 n ... n £4, from ( |32] i. ||xo — X0II2 is upper bounded by 

Upper bounds on P(f|) and P{£!1) are given in (l24l i and 



respectively. Choose r, = ^ 1, r, ^ 1 - D, -f-log^ + J 2D + ^ + ^ log'{^). (33) 



71 



/ 4i? 
1^ 



and 



72 '\/4i?log 



For Ti = 1, 

Similarly, for — 1, from ( |24] | 



6^ Proof of Theorem |?] 

0.677 jt 



P(fr) < e~i°8i/«=o) , Lemma 4: If f G L^([0,1]), then the distribution of 

J^f{t)dW is iV(0,||/||i). 

^ Proof: For notational simplicity we consider a continuous 

^{£3) ^ c~ + . (27) function / > 0. Extension to piecewise continuous functions 

„ „ . , . . ., , ., , . m • , , with finite number of discontinuities is straightforward. Note 

Following an analysis similar to one detailed in oil yields , . , r ■ aha rr^ii a- 

^ ■> ^ ■> that since the functions are denned on [0, IJ, according to 

2 + — fr+l fl T ]] Heine-Cantor theorem, they are also uniformly continuous. For 

og ■^[T2 og[ -T2)) partition V, = to < ti < . . . < t^-i < ti ^ 1, define the 

< (1 - 2t])R, (28) partition length as 

and therefore, S = sup \U — ti-i\. 

rr,, ■ ■ J , .• T^,c^r\ Consider a sequence of partitions {T'l, 7^2, •■• j'Pn, ■• -jsuch 

The remaining probability that we need to bound is P\c\). , 

. , tnat 
For r4 = - log(eL'), 5^ <- 

d " ~ n' 

i? log 2 - -(t4 - log(l + T4)) Q-ygjj jjjg (jjyjsjojj points q ^ < ^^^^^ < ^ < . . . < 

277i? ,1 , 1 o = 1, define the following piecewise constant function: 

<i?log2--^ log— -log 1 + log— 



<(log2-27y)i?+-^log(l + log— ) 7 ^ '-'- - '^ 

log 717 el? 

< (log 2 — 877)1? (29) 1^ clear from the uniform continuity that for every e > 

~ there exists rig such that for every n> 

where the last line holds because by assumption, D < (5c)^^, 

or {De)-^ > 5, and for t > 5, log(l + log(i))/logt < 0.6. I-^'W " -^"Wl < ^- * ^ t^' ^l' 

Also, since logt/ (1 + logt) > 0.5, for t > 5, Therefore, for n > 

< i?(log 2 - 1) < -0.3i?, (30) In other words. 



Therefore, 



1 

2, 



lim / (fit)- f„{t)Ydt = 0. 
P(£4^) <e-(o «''-'°s2)« + e-°-3«. (31) 

Now that we have constructed a sequence of simple functions, 
we have 

Going back to the quadratic equation (l26T l and inserting the / •^'^^ ~ il'Sx) / *^"'^^' 

It is straightforward to show that

$$\int_0^1 f_n\,dW = \sum_{i=1}^{\ell} \Big(\min_{t \in [t_{i-1}, t_i]} f(t)\Big)\big(W(t_i) - W(t_{i-1})\big),$$

which has a Gaussian distribution with mean zero and variance $\sum_{i=1}^{\ell} \big(\min_{t \in [t_{i-1}, t_i]} f(t)\big)^2 (t_i - t_{i-1})$. As $n \to \infty$, this sequence of Gaussian random variables converges in distribution to $\mathcal{N}(0, \|f\|_2^2)$. On the other hand, according to the definition of the Itô integral, they converge in mean square to $\int_0^1 f\,dW$. Therefore, $\int_0^1 f\,dW$ is distributed as $\mathcal{N}(0, \|f\|_2^2)$. ■
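
The statement of Lemma 4 can be checked numerically by discretizing the stochastic integral. The sketch below is an illustration under stated assumptions: the test function $f(t) = 1 + t$, the step count, and the helper `ito_integral_samples` are choices made for this example. It uses left-endpoint evaluation on a uniform partition, which for an increasing $f$ coincides with the minimum over each subinterval used in the proof.

```python
import numpy as np

def ito_integral_samples(f, n_steps, n_paths, rng):
    """Simulate sum_i f(t_{i-1}) (W(t_i) - W(t_{i-1})) on a uniform partition of [0, 1]."""
    t = np.linspace(0.0, 1.0, n_steps + 1)
    dt = np.diff(t)
    # Brownian increments: independent N(0, dt) variables for every path.
    dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
    return dW @ f(t[:-1])

rng = np.random.default_rng(0)
f = lambda t: 1.0 + t                    # continuous and positive on [0, 1]
samples = ito_integral_samples(f, 200, 20000, rng)

norm_sq = 7.0 / 3.0                      # integral of (1 + t)^2 over [0, 1]
print(f"sample mean {samples.mean():.3f}, "
      f"sample variance {samples.var():.3f}, ||f||_2^2 = {norm_sq:.3f}")
```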
Lemma 5: Let $f \in L^2([0,1])$ and consider

$$A(f) = \Big(\int_0^1 f\,dW_1(t), \ldots, \int_0^1 f\,dW_d(t)\Big)^T,$$

where $W_1, W_2, \ldots, W_d$ are independent Brownian motions. Then $\|A(f)\|_2^2/\|f\|_2^2$ is a $\chi^2$ random variable with $d$ degrees of freedom.

Proof: According to Lemma 4 and the independence of the $W_i$'s, the elements of $A(f)$ are i.i.d. $\mathcal{N}(0, \|f\|_2^2)$. Therefore $A(f)/\|f\|_2$ is an i.i.d. $\mathcal{N}(0,1)$ vector, which proves

$$\frac{\|A(f)\|_2^2}{\|f\|_2^2} \sim \chi^2(d).$$



Proof: Given Lemmas 4 and 5, the proof is essentially the same as the proof of Theorem 2. Therefore, we briefly mention the main steps. Let $\tilde{f}_o = \mathcal{D}(\mathcal{E}(f_o))$ and let $\hat{f}_o$ denote the reconstruction of CSP. Since $\hat{f}_o$ is the solution of CSP, we have

$$\|A(\hat{f}_o) - A(f_o)\|_2 \leq \|A(\tilde{f}_o) - A(f_o)\|_2.$$

According to Lemma 5, both $\|A(\tilde{f}_o) - A(f_o)\|_2^2/\|\tilde{f}_o - f_o\|_2^2$ and, for any fixed $f \in \mathcal{C}_R$, $\|A(f) - A(f_o)\|_2^2/\|f - f_o\|_2^2$ are $\chi^2$ random variables with $d$ degrees of freedom. Therefore it is straightforward to confirm that

$$P\big(\|A(\tilde{f}_o) - A(f_o)\|_2 > D\sqrt{d(1+\tau)}\big) \leq e^{-\frac{d}{2}(\tau - \log(1+\tau))},$$

and also that, for every $f \in \mathcal{C}_R$,

$$\|A(f) - A(f_o)\|_2 \geq \|f - f_o\|_2 \sqrt{d(1-\tau)}$$

with probability at least $1 - 2^R e^{\frac{d}{2}(\tau + \log(1-\tau))}$. Combining these results completes the proof. ■



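F. Proof of Theorem 6

Let $x_o \in \mathcal{Q}$, $y_o = A x_o$, $\tilde{x}_o = \mathcal{D}(\mathcal{E}(x_o))$, and $\hat{x}_o = \arg\min_{c \in \mathcal{C}} \|y_o - Ac\|_2$. As before, we have $\|y_o - A\hat{x}_o\|_2 \leq \|y_o - A\tilde{x}_o\|_2$. Hence, by the triangle inequality,

$$\|A\hat{x}_o - A\tilde{x}_o\|_2 - \|A x_o - A\tilde{x}_o\|_2 \leq \|A x_o - A\hat{x}_o\|_2 \leq \|A x_o - A\tilde{x}_o\|_2.$$

Rearranging the terms proves that

$$\|A\hat{x}_o - A\tilde{x}_o\|_2 \leq 2\|A x_o - A\tilde{x}_o\|_2 \leq 2\sigma_{\max}(A) D, \quad (34)$$

where $\sigma_{\max}(A)$ is the maximum singular value of $A$. Define

$$T \triangleq \{x_1 - x_2 \mid x_1 \in \mathcal{C}_R,\ x_2 \in \mathcal{C}_R\}.$$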



Define the event fj^"-* as 

£,^{$heT; \\A{h)\\2 < rVdWh^}, (35) 
and, for $t > 0$, the event $\mathcal{E}_2$ as

$$\mathcal{E}_2 \triangleq \{\sigma_{\max}(A) - \sqrt{d} - \sqrt{n} < t\sqrt{d}\}. \quad (36)$$

Using the union bound and Lemmas |5] |2] it is straightforward 
to confirm that 



(37) 



Finally, using the results on the concentration of Lipschitz functions of a Gaussian random vector [54], we obtain

$$P(\mathcal{E}_2^c) = P\big(\sigma_{\max}(A) - \sqrt{d} - \sqrt{n} \geq t\sqrt{d}\big) \leq e^{-d t^2/2}. \quad (38)$$

Combining (37) and (38) with (34) finishes the proof.
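
The concentration of the largest singular value used in (38) can be observed empirically. The sketch below draws several $d \times n$ Gaussian matrices and counts how often $\sigma_{\max}(A)$ exceeds $\sqrt{d} + \sqrt{n} + t\sqrt{d}$; the dimensions, the value of `t`, and the number of trials are arbitrary choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, t, trials = 50, 200, 0.5, 200

threshold = np.sqrt(d) + np.sqrt(n) + t * np.sqrt(d)

# np.linalg.norm(M, 2) returns the largest singular value of a 2-D array M.
sigma_max = np.array([
    np.linalg.norm(rng.standard_normal((d, n)), 2) for _ in range(trials)
])

exceed_frac = np.mean(sigma_max >= threshold)
bound = np.exp(-d * t ** 2 / 2)          # right-hand side of (38)

print(f"largest observed sigma_max {sigma_max.max():.2f}, threshold {threshold:.2f}")
print(f"empirical exceedance {exceed_frac:.4f}, bound {bound:.4f}")
```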

VII. Conclusion

In this paper, we studied the problem of employing a family of compression algorithms for compressed sensing, i.e., recovering structured signals from an undersampled set of random linear measurements. Addressing this problem enables CS schemes to exploit the complicated structures integrated in compression algorithms. We proposed the compressible signal pursuit (CSP) algorithm, which outputs the codeword that best matches the measurements. We proved that, for a family of compression algorithms whose rate-distortion function satisfies $\limsup_{D \to 0} R(D)/\log(1/D) < \alpha$, with $\alpha$ smaller than the ambient dimension, CSP recovers signals with high probability from $4\alpha$ measurements. CSP is also applicable to infinite-dimensional signal classes. The CSP algorithm is still computationally demanding and requires approximation or simplification for practical applications. This important direction is left for future research.